Chip 1997 March




Text File | 1995-04-14 | 9.8 KB | 293 lines

ECHOSPEECH
High Quality Speech Compression for Multimedia

Shareware Manual

Echo Speech Corporation
6460 Via Real
Carpinteria, CA 93013
Phone: 805/684-4593
FAX: 805/684-6628

Copyright 1994-1995 Echo Speech Corporation. All Rights Reserved.

No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any other language or computer language in whole or in part, in any form or by any means, whether it be electronic, mechanical, magnetic, optical, manual or otherwise, without prior written consent of Echo Speech Corporation.

Echo Speech Corporation disclaims all warranties as to this software, whether express or implied, including without limitation any implied warranties of merchantability, fitness for a particular purpose, functionality, data integrity or protection.

ECHOSPEECH is a trademark of Echo Speech Corporation. MS-DOS and Microsoft Windows are trademarks of Microsoft Corporation.

See the file REGISTER.DOC for information about the benefits of registering ECHOSPEECH.

What's ECHOSPEECH?

ECHOSPEECH is the first speech compression algorithm designed primarily for multimedia applications. Other speech compression algorithms were designed for communications systems, like digital cellular phones or military radios, so they only provide the same narrow frequency response as a telephone, and they generally need special hardware to run.

We at Echo Speech Corporation are in the multimedia business, so we designed ECHOSPEECH -- efficient, high-quality, cross-platform speech compression for multimedia.

ECHOSPEECH preserves frequencies up to 5500 Hz, so the output sounds crisp, clear and understandable. ECHOSPEECH can play back speech in real time on anything faster than a 386SX/16, with no need for a math coprocessor. ECHOSPEECH decompression is also available for the Macintosh, making it the first cross-platform speech compression package.
ECHOSPEECH reduces the amount of storage space required for 16-bit speech by a factor of more than 13 to 1, with very little noticeable degradation in perceived quality. For example, for 16-bit speech sampled at 11.025 kHz, ECHOSPEECH reduces the storage required for a second of speech from 22,050 bytes to 1,650 bytes.

Installation

This shareware version of ECHOSPEECH consists of two main programs which run under MS-DOS. One program is a speech compression program called WAV2ES.EXE, which reads a Microsoft Windows .WAV file and generates a compressed speech file which also has the file name extension .WAV. The other program is called ES2WAV.EXE. It reads the compressed .WAV file and generates an uncompressed output file in .WAV format.

Although the file name extension of the compressed speech file is .WAV, and the file complies with Microsoft's .WAV file format, it contains compressed ECHOSPEECH data, and must be manually decompressed with ES2WAV.EXE before it can be played back. This decompression can be done automatically "on the fly" with the ECHOSPEECH Audio Compression Manager files which are part of the registered ECHOSPEECH shareware package.

The distribution contains an install program

    INSTALL.BAT

the program files

    WAV2ES.X87
    WAV2ES.EMU
    ES2WAV.EXE
    ISFPP.EXE

and the documentation files

    LICENSE.DOC
    REGISTER.DOC
    MANUAL.DOC
    ORDER.DOC

You can place the ECHOSPEECH files in any convenient subdirectory, or you can make a new subdirectory, like "\ECHO", for example, and put them there.

To install ECHOSPEECH, connect to the subdirectory where you put the ECHOSPEECH files and run the installation program, INSTALL.BAT. INSTALL will remind you that this version of ECHOSPEECH is shareware, determine whether or not your system has a math coprocessor, and copy the distribution files appropriately. The small program ISFPP.EXE is used by INSTALL.BAT to detect the presence or absence of a math coprocessor.
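
The storage figures quoted earlier (22,050 bytes versus 1,650 bytes per second of speech) can be checked with a few lines of arithmetic. The sketch below is written in Python, which is not part of the ECHOSPEECH package. The constant and function names are hypothetical and chosen only for illustration. It assumes mono speech, which is what 22,050 bytes per second of 16-bit samples at 11.025 kHz implies, and it ignores .WAV header overhead.

```python
# Figures taken from the manual: 16-bit mono speech sampled at 11.025 kHz.
SAMPLE_RATE_HZ = 11025
BYTES_PER_SAMPLE = 2
COMPRESSED_BYTES_PER_SECOND = 1650  # quoted in the manual


def raw_bytes(seconds):
    """Storage for uncompressed 16-bit mono speech (hypothetical helper)."""
    return round(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def compressed_bytes(seconds):
    """Approximate storage for the compressed speech data only.

    Assumption: header overhead and frame rounding are ignored.
    """
    return round(seconds * COMPRESSED_BYTES_PER_SECOND)


def compression_ratio():
    """Ratio of raw to compressed bytes per second."""
    return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE) / COMPRESSED_BYTES_PER_SECOND


if __name__ == "__main__":
    print(raw_bytes(1))                   # 22050
    print(compressed_bytes(60))           # 99000
    print(round(compression_ratio(), 2))  # 13.36
```

By this estimate, one minute of speech needs roughly 99,000 bytes compressed instead of about 1.3 million bytes uncompressed, which agrees with the "more than 13 to 1" figure.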
Running ECHOSPEECH

Compressing a speech file:

To convert a Windows .WAV file containing speech sampled at 11.025 kHz to a compressed ECHOSPEECH file, type the following command:

    wav2es infile [outfile]

wav2es - is the ECHOSPEECH coder program name. The installation will have created the proper version of this program based on whether or not your system has a math coprocessor.

infile - is the path to the input file to be compressed. This file must be a Windows .WAV file which has been recorded at 11.025 kHz. The .WAV extension is an optional part of the file name on the command line and .WAV will be assumed automatically by the wav2es program in the absence of another extension.

[outfile] - is the optional output file name. If this argument is omitted from the command line, wav2es will use the complete infile path and file name, and will begin the output file name with the character "_".

For example, the command:

    wav2es c:\speech\sample

will cause the wav2es program to compress a file called sample.wav which is located in the subdirectory c:\speech. The compressed file will be saved as c:\speech\_sample.wav. The wav2es program will always verify before overwriting an existing file and give you the option of entering an alternative file name.

Decompressing a speech file:

To convert a compressed ECHOSPEECH file to a decompressed Windows .WAV file, type the following command:

    es2wav [/8] infile [outfile]

es2wav - is the ECHOSPEECH decoder program.

[/8] - This optional command line parameter tells the es2wav program to create an 8-bit output file (see the warning regarding 8-bit files below). If you omit this parameter es2wav will create a 16-bit file.

infile - is the path to the input file to be decoded. This file must be a Windows .WAV file which has been previously coded by the wav2es program. You can omit the .WAV extension as a part of the infile expression.

[outfile] - is the optional output file name.
If this argument is omitted from the command line, es2wav will use the complete infile path, removing any initial "_" character from the file name, as the output file name.

For example, the command:

    es2wav /8 c:\sample\_speech

will cause the es2wav program to decompress the file c:\sample\_speech.wav and store the decompressed file as c:\sample\speech.wav in 8-bit format. If the original speech.wav file still exists in the c:\sample subdirectory, es2wav will ask you if it can be overwritten, and if not it will give you the opportunity to supply an alternate file name.

A Warning About 8-Bit Speech

ECHOSPEECH will compress speech files consisting of 8-bit speech samples, and generate files with 8-bit speech samples when it uncompresses. However, ECHOSPEECH and other speech compression algorithms work much better on files of 16-bit speech samples. If you have a 16-bit sound card, the results will be much better if you record the speech using 16-bit samples rather than 8-bit samples, and if you use 16-bit samples for the uncompressed speech output from ECHOSPEECH.

A Word About ECHOSPEECH Frames

ECHOSPEECH compresses and uncompresses speech in units called "frames." An ECHOSPEECH "frame" is 147 samples long. Why did we pick 147 samples? Because ECHOSPEECH was designed to process 75 frames of speech per second, which at a sampling rate of 11.025 kHz corresponds to 147 samples per frame (11,025 / 75 = 147). A rate of 75 frames per second works well with CD-ROMs and other media.

That means that ECHOSPEECH will ignore up to 146 samples at the end of the input raw speech file, and that output speech files will contain a multiple of 147 samples of speech data.

Getting The Best Results With ECHOSPEECH

Remember: "garbage in, landfill out." The better the input, the better job ECHOSPEECH can do in retaining the quality of the speech.

o Use good audio recording techniques. Try to record speech with as little background noise, hum, distortion and other extraneous sound as possible. Watch the gain and don't clip the speech.
  Use a 16-bit sound card for recording if at all possible.

o Avoid multiple voices. ECHOSPEECH can only handle one person speaking at a time.

o If possible, don't use speech which has previously been compressed and decompressed. Like multiple generations of tape recordings, some quality will be lost each time.

o Don't normalize the speech to full range -- this can occasionally cause glitches in the output speech. 90% of full range is almost always OK.

o Unlike some speech compression packages, ECHOSPEECH works just fine on high-pitched voices -- don't hesitate to use them.

o ECHOSPEECH also works well on speech which has had its dynamic range compressed.

o ECHOSPEECH may work on some non-speech sounds, and if you like what you get, that's great, but ECHOSPEECH is a speech coder, not an audio coder.

Possible Problems

Even though a lot of effort has been spent perfecting ECHOSPEECH, an occasional minor but audible glitch in the output speech is possible.

Some glitches are due to the fact that ECHOSPEECH processes speech in units of frames. Audible glitches can sometimes occur when the speech waveform is changing quickly at or near a boundary between frames. Many of these can be eliminated simply by adding or removing a number of zero samples (say 50 to 75) at the beginning of the speech file.

Other glitches may result when the input speech has been clipped or otherwise distorted.

Comments about ECHOSPEECH can be sent to comment@echospeech.com
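
The frame arithmetic and the zero-padding workaround described above can be sketched in Python. This is not part of the ECHOSPEECH package. The function names are hypothetical, and the sketch assumes an uncompressed mono PCM .WAV file that Python's standard wave module can read. It shifts the frame boundaries by padding the raw input file before compression and does not touch compressed ECHOSPEECH data. Using the value 128 as silence for 8-bit files is an assumption based on the usual unsigned 8-bit .WAV convention.

```python
import wave

SAMPLES_PER_FRAME = 147  # from the manual: 11,025 samples/s at 75 frames/s


def frame_split(num_samples):
    """Return (whole ECHOSPEECH frames, trailing samples that would be ignored)."""
    return divmod(num_samples, SAMPLES_PER_FRAME)


def silence(sample_width, count):
    """Bytes for `count` silent mono samples.

    Assumption: 8-bit .WAV data is unsigned (silence = 128);
    wider samples are signed (silence = 0).
    """
    if sample_width == 1:
        return b"\x80" * count
    return b"\x00" * (sample_width * count)


def pad_start(src_path, dst_path, pad_samples=60):
    """Copy a mono PCM .WAV file, prepending `pad_samples` silent samples.

    Note: the wave module calls one sample per channel a "frame";
    that is unrelated to the 147-sample ECHOSPEECH frame.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.nchannels != 1:
            raise ValueError("this sketch handles mono files only")
        data = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The header's sample count is corrected when the file is closed.
        dst.writeframes(silence(params.sampwidth, pad_samples) + data)
```

For instance, frame_split(11100) gives 75 whole frames with 75 samples left over, so the last 75 samples of an 11,100-sample recording would be ignored. A padded copy made with pad_start can then be compressed with wav2es in the usual way, and different values of pad_samples in the suggested range of 50 to 75 can be tried if a glitch persists.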